    Analyzing the Unanalyzable: an Application to Android Apps

    In general, software is unreliable. Its behavior can deviate from users’ expectations because of bugs, vulnerabilities, or even malicious code. Manually vetting software is a challenging, tedious, and highly costly task that does not scale. To alleviate excessive costs and analysts’ burdens, automated static analysis techniques have been proposed by both the research and practitioner communities, making static analysis a central topic in software engineering. In the meantime, mobile apps have considerably grown in importance. Today, most humans carry software in their pockets, with the Android operating system leading the market. Millions of apps have been made available to the public so far, targeting a wide range of activities such as games, health, banking, GPS, etc. Hence, Android apps collect and manipulate a considerable amount of sensitive information, which puts users’ security and privacy at risk. Consequently, it is paramount to ensure that apps distributed through public channels (e.g., Google Play) are free from malicious code. The research and practitioner communities have therefore put much effort into devising new automated techniques to vet Android apps against malicious activities over the last decade. Analyzing Android apps is, however, challenging. On the one hand, the Android framework offers constructs that can be used to evade dynamic analysis by triggering the malicious code only under certain circumstances, e.g., if the device is not an emulator and is currently connected to power. Hence, dynamic analyses can easily be fooled by malicious developers who make some code fragments difficult to reach. On the other hand, static analyses are challenged by Android-specific constructs that limit the coverage of off-the-shelf static analyzers. The research community has already addressed some of these constructs, including inter-component communication and lifecycle methods. However, other constructs, such as implicit calls (i.e., when the Android framework asynchronously triggers a method in the app code), make some app code fragments unreachable to static analyzers, even though these fragments are executed when the app is run. Altogether, many parts of apps’ code are unanalyzable: they are either not reachable by dynamic analyses or not covered by static analyzers. In this manuscript, we describe our contributions to the research effort from two angles: ① statically detecting malicious code that is difficult for dynamic analyzers to reach because it is triggered only under specific circumstances; and ② statically analyzing code that is not accessible to existing static analyzers, to improve the comprehensiveness of app analyses. More precisely, in Part I, we first present a replication study of a state-of-the-art static logic bomb detector to better expose its limitations. We then introduce a novel hybrid approach for detecting suspicious hidden sensitive operations towards triaging logic bombs. We finally detail the construction of a dataset of Android apps automatically infected with logic bombs. In Part II, we present our work to improve the comprehensiveness of Android apps’ static analysis. More specifically, we first show how we contributed to accounting for atypical inter-component communication in Android apps. Then, we present a novel approach to unify both the bytecode and native code in Android apps, to account for the multi-language trend in app development. Finally, we present our work to resolve conditional implicit calls in Android apps to improve static and dynamic analyzers.
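
    To illustrate the trigger-based evasion discussed above, the following sketch shows a hypothetical guard of that kind: the hidden branch runs only on a real device that is currently charging, so emulator-based dynamic analyses never reach it. The class and method names are illustrative assumptions, not code from the manuscript.

        import android.content.Context;
        import android.content.Intent;
        import android.content.IntentFilter;
        import android.os.BatteryManager;
        import android.os.Build;

        public class TriggerSketch {

            // Heuristic emulator check combined with a charging check: the guarded
            // branch is reached only on a real, charging device.
            static boolean shouldRunHiddenBehavior(Context context) {
                boolean looksLikeEmulator = Build.FINGERPRINT.contains("generic")
                        || Build.MODEL.contains("Emulator");
                Intent battery = context.registerReceiver(
                        null, new IntentFilter(Intent.ACTION_BATTERY_CHANGED));
                int status = battery == null ? -1
                        : battery.getIntExtra(BatteryManager.EXTRA_STATUS, -1);
                boolean charging = status == BatteryManager.BATTERY_STATUS_CHARGING
                        || status == BatteryManager.BATTERY_STATUS_FULL;
                return !looksLikeEmulator && charging;
            }

            static void onAppStart(Context context) {
                if (shouldRunHiddenBehavior(context)) {
                    // A malicious developer would place the hidden sensitive operation here.
                }
            }
        }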

    A Dataset of Android Libraries

    Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it challenges static analyzers’ scalability and precision, since libraries can account for a large part of the app code. As a direct consequence, when a static analysis is performed, it is common practice in the literature to consider only developer code, with the assumption that the sought issues lie in developer code rather than in the libraries. However, analysts need to precisely distinguish between library code and developer code in Android apps to ensure the effectiveness of static analysis. Currently, many static analysis approaches rely on whitelists of libraries. However, these whitelists are unreliable, as they are inaccurate and largely non-comprehensive. In this paper, we propose a new approach to address the lack of comprehensive and automated solutions for producing accurate and always up-to-date sets of third-party libraries. First, we demonstrate the continued need for a whitelist of third-party libraries. Second, we propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the research community, contains 20,162 libraries to date and is meant to evolve. Third, we illustrate the significance of using AndroLibZoo to filter libraries in recent apps. Fourth, we demonstrate that AndroLibZoo is more suitable than the current state-of-the-art list for improved static analysis. Finally, we show how the use of AndroLibZoo can enhance the performance of existing Android app static analyzers.
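
    As a minimal sketch of how such a library list can be used in practice, the code below filters out classes whose fully qualified names match a known library package prefix, so that a downstream analysis focuses on developer code. The one-prefix-per-line file format and the class/method names are assumptions for illustration, not AndroLibZoo’s actual tooling.

        import java.io.IOException;
        import java.nio.file.Files;
        import java.nio.file.Paths;
        import java.util.List;
        import java.util.stream.Collectors;

        public class LibraryFilter {

            private final List<String> libraryPrefixes;

            LibraryFilter(String prefixFile) throws IOException {
                // One package prefix per line, e.g. "com.google.gson" or "okhttp3".
                this.libraryPrefixes = Files.readAllLines(Paths.get(prefixFile));
            }

            boolean isLibraryClass(String className) {
                return libraryPrefixes.stream().anyMatch(className::startsWith);
            }

            // Keep only developer classes for the downstream static analysis.
            List<String> developerClasses(List<String> allClasses) {
                return allClasses.stream()
                        .filter(c -> !isLibraryClass(c))
                        .collect(Collectors.toList());
            }
        }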

    A First Look at Android Applications in Google Play related to Covid-19

    Due to the convenience of on-demand access to information and business solutions, mobile apps have become an important asset in the digital world. In the context of the Covid-19 pandemic, app developers have joined the response effort in various ways by releasing apps that target different user bases (e.g., all citizens or journalists), offer different services (e.g., location tracking or diagnostic aid), provide generic or specialized information, etc. While many apps have raised concerns by spreading misinformation or even malware, the literature does not yet provide a clear landscape of the different apps that were developed. In this study, we focus on the Android ecosystem and investigate Covid-related Android apps. In a best-effort scenario, we attempt to systematically identify all relevant apps and study their characteristics with the objective of providing a first taxonomy of Covid-related apps, broadening the relevance beyond the implementation of contact tracing. Overall, our study yields a number of empirical insights that enlarge the knowledge on Covid-related apps: (1) developer communities contributed rapidly to the Covid-19 response, with dedicated apps released as early as January 2020; (2) Covid-related apps deliver digital tools to users (e.g., health diaries), serve to broadcast information to users (e.g., spread statistics), and collect data from users (e.g., for tracing); (3) Covid-related apps are less complex than standard apps; (4) they generally do not seem to leak sensitive data; and (5) in the majority of cases, Covid-related apps are released by entities with past experience on the market, mostly official government entities or public health organizations.

    TriggerZoo: A Dataset of Android Applications Automatically Infected with Logic Bombs

    Many Android app analyzers rely, among other techniques, on dynamic analysis to monitor apps’ runtime behavior and detect potential security threats. However, malicious developers use subtle yet effective techniques to bypass dynamic analyzers. Logic bombs are a popular example: the malicious code is triggered only under specific circumstances, challenging comprehensive dynamic analyses. The research community has proposed various approaches and tools to detect logic bombs. Unfortunately, rigorous assessment and fair comparison of state-of-the-art techniques are impossible due to the lack of ground truth. In this paper, we present TriggerZoo, a new dataset of 406 Android apps containing logic bombs and benign trigger-based behavior, which we release to the research community only, through an authenticated API. These apps are real-world apps from Google Play that have been automatically infected by our tool AndroBomb. The injected pieces of code implementing the logic bombs cover a broad palette of realistic logic bomb types that we have manually characterized from a set of real logic bombs. Researchers can exploit this dataset as ground truth to assess their approaches and provide comparisons against other tools.
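
    As a purely hypothetical illustration of one logic bomb family such a dataset can cover (not an actual TriggerZoo/AndroBomb payload), the sketch below guards a behavior behind a hard-coded activation date, so that dynamic analysis runs executed before that date never observe it.

        import java.util.Calendar;

        public class TimeTriggerSketch {

            // True only after a hard-coded activation date.
            static boolean afterActivationDate() {
                Calendar activation = Calendar.getInstance();
                activation.set(2023, Calendar.JANUARY, 1);
                return Calendar.getInstance().after(activation);
            }

            static void run() {
                if (afterActivationDate()) {
                    // The trigger-based (possibly sensitive) behavior would execute here.
                }
            }
        }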

    Sensitive and Personal Data: What Exactly Are You Talking About?

    Mobile devices are pervasively used for a variety of tasks, including the processing of sensitive data in mobile apps. While in most cases access to this data is legitimate, malware often targets sensitive data, and even benign apps collect more data than necessary for their task. Therefore, researchers have proposed several frameworks to detect and track the use of sensitive data in apps, so as to disclose and prevent unauthorized access and data leakage. Unfortunately, a review of the literature reveals a lack of consensus on what sensitive data is in the context of technical frameworks like Android. Authors either provide an intuitive or ad-hoc definition, derive their definition from the Android permission model, or rely on previous research papers that may or may not define sensitive data. In this paper, we provide an overview of existing definitions of sensitive data in the literature and in legal frameworks. We further provide a sound definition of sensitive data derived from the definitions of personal data in several legal frameworks. To help the scientific community advance further in this field, we publicly provide a list of sensitive sources from the Android framework, thus starting a community project leading to a complete list of sensitive API methods across different frameworks and programming languages.
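
    To make the notion of a sensitive source concrete, the sketch below calls two Android framework methods that are commonly treated as sensitive sources because they return personal data; exactly which methods belong in that category is what the paper’s definition and published list are meant to pin down (the example itself is illustrative and not drawn from that list).

        import android.content.Context;
        import android.location.Location;
        import android.location.LocationManager;
        import android.telephony.TelephonyManager;

        public class SensitiveSourceSketch {

            // Both calls require dangerous permissions and return personal data,
            // which is why analysis frameworks typically list them as sensitive sources.
            static void readPersonalData(Context context) {
                LocationManager lm =
                        (LocationManager) context.getSystemService(Context.LOCATION_SERVICE);
                Location lastFix = lm.getLastKnownLocation(LocationManager.GPS_PROVIDER);

                TelephonyManager tm =
                        (TelephonyManager) context.getSystemService(Context.TELEPHONY_SERVICE);
                String subscriberId = tm.getSubscriberId();
            }
        }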

    Negative Results of Fusing Code and Documentation for Learning to Accurately Identify Sensitive Source and Sink Methods: An Application to the Android Framework for Data Leak Detection

    Almost two-thirds of the population owns a mobile phone. Given the profusion of mobile applications that manipulate all sorts of data, privacy-related concerns arise more and more. New regulations such as the General Data Protection Regulation (GDPR) provide rules with which developers must comply when their apps process sensitive and/or private data. Ensuring that no such data is leaked without the consent of the user is a primary objective in each GDPR compliance check. Researchers have proposed sophisticated approaches to track sensitive data within mobile apps, all of which rely on specific lists of sensitive source and sink methods. The quality of these lists largely determines the results of the data flow analysis. Previous approaches either used incomplete hand-written lists that quickly became outdated or relied on machine learning. The latter, however, leads to numerous false positives, as we show. This paper introduces CoDoC, which aims to revive the machine-learning approach to precisely identify privacy-related source and sink API methods. In contrast to previous approaches, CoDoC uses deep learning techniques and combines the source code with the documentation of API methods. First, we propose novel definitions that clarify the concepts of taint analysis, source, and sink methods. Second, based on these definitions, we build a new ground truth of Android methods representing sensitive sources, sinks, and methods that are neither, which we use to train our classifier. We evaluate CoDoC and show that, on our validation dataset, it achieves a precision, recall, and F1 score of 91%, outperforming the state-of-the-art SuSi. However, similarly to existing tools, we show that in the wild, i.e., with unseen data, CoDoC performs poorly and generates many false-positive results. Our findings suggest that machine-learning models for abstract concepts such as privacy fail in practice despite good lab results. To encourage future research, we release all our artifacts to the community.
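
    For context, the sketch below shows the classic source-to-sink pattern that such source/sink lists are meant to seed: a value returned by a sensitive source method flows into a sink method that makes it leave the device, and a taint analysis configured with accurate lists should flag this flow. The example is illustrative and not taken from CoDoC’s ground truth.

        import android.content.Context;
        import android.telephony.SmsManager;
        import android.telephony.TelephonyManager;

        public class LeakSketch {

            static void leakDeviceId(Context context) {
                TelephonyManager tm =
                        (TelephonyManager) context.getSystemService(Context.TELEPHONY_SERVICE);
                String deviceId = tm.getDeviceId();   // source: returns private data

                // sink: the tainted value leaves the device via SMS; a taint analysis
                // seeded with accurate source/sink lists should report this flow.
                SmsManager.getDefault()
                        .sendTextMessage("+0000000000", null, deviceId, null, null);
            }
        }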

    Cryptojacking
